158 research outputs found

Mason – A Read Simulator for Second Generation Sequencing Data

Author: Holtgrewe M.
Publication date: 01/01/2010

We present read simulator software for Illumina, 454 and Sanger reads. Its features include position-specific error rates and base quality values. For Illumina reads, we give a comprehensive analysis with empirical data for the error and quality model. For the other technologies, we use models from the literature. The software has been written with performance in mind and can sample reads from large genomes. The C++ source code is extensible and freely available under the GPL/LGPL.

Institutional Repository of the Freie Universität Berlin

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)
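
The position-specific error rates mentioned in the Mason abstract above can be illustrated with a minimal sketch. This is not Mason's code (Mason is written in C++) and not its empirical Illumina model: the function names, the linear error profile, and the rate values are assumptions made only for illustration, and only substitution errors are simulated.

```python
import random

BASES = "ACGT"

def linear_error_rates(length, first=0.001, last=0.02):
    """Hypothetical error profile: substitution rate rises linearly along the read."""
    if length == 1:
        return [first]
    step = (last - first) / (length - 1)
    return [first + i * step for i in range(length)]

def simulate_read(genome, start, length, error_rates, rng):
    """Copy `length` bases from `genome` and substitute each base with
    the probability given for its position in the read."""
    read = []
    for i in range(length):
        base = genome[start + i]
        if rng.random() < error_rates[i]:
            # Replace the base with one of the three other bases.
            base = rng.choice([b for b in BASES if b != base])
        read.append(base)
    return "".join(read)

# Toy usage with a random genome (made-up data, fixed seed for repeatability).
rng = random.Random(42)
genome = "".join(rng.choice(BASES) for _ in range(1000))
rates = linear_error_rates(36)
read = simulate_read(genome, 100, 36, rates, rng)
```

A realistic simulator would also emit base quality values and model insertions and deletions, which this sketch leaves out.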

HLA-MA: Simple yet powerful matching of samples using HLA typing results

Authors: Beule D., Holtgrewe M., Messerschmidt C.
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 24/08/2016

We propose the simple method HLA-MA for consistency checking in pipelines operating on human HTS data. The method is based on the HLA typing result of the state-of-the-art method OptiType. Provided that there is sufficient coverage of the HLA loci, comparing HLA types allows for simple, fast, and robust matching of samples from whole genome, exome, and RNA-seq data. This approach is reliable for sample re-identification even for samples with high mutational loads, e.g., caused by microsatellite instability or POLE1 defects.

MDC Repository
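
The idea in the HLA-MA abstract above, matching samples by comparing their HLA types, can be sketched as follows. This is an illustrative assumption of how such a comparison might look, not the HLA-MA implementation: the function names, the dictionary layout, the threshold, and the allele calls are all hypothetical.

```python
def hla_concordance(types_a, types_b):
    """Fraction of shared loci whose allele pairs agree between two samples.

    Each argument maps a locus name to a pair of allele calls;
    the order of the two alleles within a pair is ignored.
    """
    shared = set(types_a) & set(types_b)
    if not shared:
        raise ValueError("no HLA locus typed in both samples")
    matches = sum(
        1 for locus in shared
        if sorted(types_a[locus]) == sorted(types_b[locus])
    )
    return matches / len(shared)

def same_donor(types_a, types_b, threshold=1.0):
    """Hypothetical decision rule: require full agreement by default."""
    return hla_concordance(types_a, types_b) >= threshold

# Made-up typing results for three samples.
tumor = {"A": ("A*02:01", "A*24:02"), "B": ("B*07:02", "B*44:02"), "C": ("C*05:01", "C*07:02")}
normal = {"A": ("A*24:02", "A*02:01"), "B": ("B*44:02", "B*07:02"), "C": ("C*07:02", "C*05:01")}
other = {"A": ("A*01:01", "A*03:01"), "B": ("B*07:02", "B*44:02"), "C": ("C*04:01", "C*07:02")}
```

With these toy inputs, `same_donor(tumor, normal)` holds because every locus agrees, while `other` agrees with `tumor` at only one of three loci.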

RazerS 3: Faster, fully sensitive read mapping

Authors: Holtgrewe M., Reinert K., Weese D.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 24/08/2012

Motivation: In recent years, next-generation sequencing (NGS) has become a key technology for many applications in the biomedical sciences. Throughput continues to increase and new protocols provide longer reads than currently available. In almost all applications, read mapping is a first step. Hence, it is crucial to have algorithms and implementations that perform fast, with high sensitivity, and are able to deal with long reads and a large absolute number of indels. Results: RazerS is a read mapping program with adjustable sensitivity based on counting q-grams. In this work we propose the successor RazerS 3, which now supports shared-memory parallelism, an additional seed-based filter with adjustable sensitivity, a much faster, banded version of Myers’ bit-vector algorithm for verification, memory-saving measures, and support for the SAM output format. This leads to much improved performance for mapping reads, in particular long reads with many errors. We extensively compare RazerS 3 with other popular read mappers and show that its results are often superior in terms of sensitivity while exhibiting practical and often competitive run times. In addition, RazerS 3 works without a precomputed index. Availability and Implementation: Source code and binaries are freely available for download at http://www.seqan.de/projects/razers. RazerS 3 is implemented in C++ and OpenMP under a GPL license using the SeqAn library and supports Linux, Mac OS X, and Windows.

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)
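
The q-gram counting filter named in the RazerS 3 abstract above rests on the q-gram lemma: a read of length m that matches a reference substring with at most k errors shares at least m - q + 1 - k*q q-grams with it. The sketch below is a naive Python illustration of that filter, not the RazerS implementation (which is written in C++ with SeqAn); the function names and the window-scanning strategy are assumptions, and candidates would still need verification by an alignment step.

```python
from collections import Counter

def qgrams(s, q):
    """Multiset of all overlapping substrings of length q in s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))

def qgram_threshold(m, q, k):
    """q-gram lemma: minimum number of shared q-grams for a read of
    length m matching with at most k errors."""
    return m - q + 1 - k * q

def candidate_windows(reference, read, q, k):
    """Start positions whose window of length m + k shares enough q-grams
    with the read to possibly contain a match with at most k errors."""
    m = len(read)
    t = qgram_threshold(m, q, k)
    if t <= 0:
        raise ValueError("q is too large for this read length and error count")
    read_counts = qgrams(read, q)
    hits = []
    for start in range(len(reference)):
        window = reference[start:start + m + k]
        # Counter intersection keeps the minimum multiplicity of each q-gram.
        shared = sum((read_counts & qgrams(window, q)).values())
        if shared >= t:
            hits.append(start)
    return hits
```

This scan recomputes the q-gram multiset for every window and is therefore slow; it only shows why counting q-grams can discard most reference positions without losing any true match.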

Digestiflow: from BCL to FASTQ with ease

Authors: Beule D., Holtgrewe M., Messerschmidt C., Nieminen M.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 15/03/2020

Management of raw sequencing data and its pre-processing (conversion into sequences and demultiplexing) remains a challenging topic for groups running sequencing devices. Existing solutions range from manual management of spreadsheets to very complex and customized laboratory information management systems that handle much more than just raw sequencing data. In this article, we describe the software package DigestiFlow, which focuses on the management of Illumina flow cell sample sheets and raw data. It allows for automated extraction of information from flow cell data and management of sample sheets. Furthermore, it allows for the automated and reproducible conversion of Illumina base calls to sequences and the demultiplexing thereof using bcl2fastq and Picard Tools, followed by quality control report generation. Availability and implementation: The software is available under the MIT license at https://github.com/bihealth/digestiflow-server. The client software components are available via Bioconda.

Crossref

MDC Repository

AltamISA: a Python API for ISA-Tab files

Authors: Beule D., Holtgrewe M., Kirwan J., Kuhring M., Nieminen M.
Publication venue: 'The Open Journal'
Publication date: 20/08/2019

MDC Repository

SCelVis: exploratory single cell data analysis on the desktop and in the cloud

Authors: Beule D., Holtgrewe M., Messerschmidt C., Nieminen M., Obermayer B.
Publication venue: 'PeerJ'
Publication date: 19/02/2020

BACKGROUND: Single cell omics technologies present unique opportunities for biomedical and life sciences from lab to clinic, but the high dimensional nature of such data poses challenges for computational analysis and interpretation. Furthermore, FAIR data management as well as data privacy and security become crucial when working with clinical data, especially in cross-institutional and translational settings. Existing solutions are either bound to the desktop of one researcher or come with dependencies on vendor-specific technology for cloud storage or user authentication. RESULTS: To facilitate analysis and interpretation of single-cell data by users without bioinformatics expertise, we present SCelVis, a flexible, interactive and user-friendly app for web-based visualization of pre-processed single-cell data. Users can survey multiple interactive visualizations of their single cell expression data and cell annotation, define cell groups by filtering or manual selection, perform differential gene expression analysis, and download raw or processed data for further offline analysis. SCelVis can be run both on the desktop and cloud systems, accepts input from local and various remote sources using standard and open protocols, and allows for hosting data in the cloud and locally. We test and validate our visualization using publicly available scRNA-seq data. METHODS: SCelVis is implemented in Python using Dash by Plotly. It is available as a standalone application, as a Python package, via Conda/Bioconda, and as a Docker image. All components are available as open source under the permissive MIT license and are based on open standards and interfaces, enabling further development and integration with third party pipelines and analysis components. The GitHub repository is https://github.com/bihealth/scelvis

MDC Repository

SCelVis: Powerful explorative single cell data analysis on the desktop and in the cloud

Authors: Beule D., Holtgrewe M., Messerschmidt C., Nieminen M., Obermayer B.
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 24/07/2019

Background: Single cell omics technologies present unique opportunities for biomedical and life sciences from lab to clinic, but the high dimensional nature of such data poses challenges for computational analysis and interpretation. Furthermore, FAIR data management as well as data privacy and security become crucial when working with clinical data, especially in cross-institutional and translational settings. Existing solutions are either bound to the desktop of one researcher or come with dependencies on vendor-specific technology for cloud storage or user authentication. Results: To facilitate analysis and interpretation of single-cell data by users without bioinformatics expertise, we present SCelVis, a flexible, interactive and user-friendly app for web-based visualization of pre-processed single-cell data. Users can survey multiple interactive visualizations of their single cell expression data and cell annotation, and download raw or processed data for further offline analysis. SCelVis can be run both on the desktop and cloud systems, accepts input from local and various remote sources using standard and open protocols, and allows for hosting data in the cloud and locally. Methods: SCelVis is implemented in Python using Dash by Plotly. It is available as a standalone application, as a Python package, via Conda/Bioconda, and as a Docker image. All components are available as open source under the permissive MIT license and are based on open standards and interfaces, enabling further development and integration with third party pipelines and analysis components. The GitHub repository is https://github.com/bihealth/scelvis

MDC Repository

Identification and ranking of recurrent neo-epitopes in cancer

Authors: Beule D., Blanc E., Blankenstein T., Dhamodaran A., Holtgrewe M., Messerschmidt C., Willimsky G.
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 10/08/2018

Neo-epitopes are emerging as attractive targets for cancer immunotherapy, and new strategies for rapid identification of relevant candidates have become a priority. We propose a method for in silico selection of candidates which have a high potential for neo-antigen generation and are likely to appear in multiple patients. This is achieved by carefully screening 33 TCGA data sets for recurrent somatic amino acid exchanges and, for the 1,055 resulting recurrent variants, applying MHC class I binding prediction algorithms. A preliminary confirmation of epitope binding and recognition by CD8 T cells has been carried out for a couple of candidates in humanized mice. Recurrent neo-epitopes may be suitable to supplement existing personalized T cell treatment approaches with precision treatment options.

MDC Repository

Identification and ranking of recurrent neo-epitopes in cancer

Authors: Beule D., Blanc E., Blankenstein T., Dhamodaran A., Holtgrewe M., Messerschmidt C., Willimsky G.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/11/2019

BACKGROUND: Immune escape is one of the hallmarks of cancer and several new treatment approaches attempt to modulate and restore the immune system’s capability to target cancer cells. At the heart of the immune recognition process lies antigen presentation from somatic mutations. These neo-epitopes are emerging as attractive targets for cancer immunotherapy and new strategies for rapid identification of relevant candidates have become a priority. METHODS: We carefully screen TCGA data sets for recurrent somatic amino acid exchanges and apply MHC class I binding predictions. RESULTS: We propose a method for in silico selection and prioritization of candidates which have a high potential for neo-antigen generation and are likely to appear in multiple patients. While the percentage of patients carrying a specific neo-epitope and HLA-type combination is relatively small, the sheer number of new patients leads to surprisingly high recurrence numbers. We identify 769 epitopes which are expected to occur in 77,629 patients per year. CONCLUSION: While our candidate list will definitely contain false positives, the results provide an objective order for wet-lab testing of reusable neo-epitopes. Thus recurrent neo-epitopes may be suitable to supplement existing personalized T cell treatment approaches with precision treatment options.

MDC Repository